Skip to content

fix(cli): add missing subprocess.run() timeouts in doctor and status#3693

Closed
dieutx wants to merge 1 commit intoNousResearch:mainfrom
dieutx:fix/cli-subprocess-timeouts
Closed

fix(cli): add missing subprocess.run() timeouts in doctor and status#3693
dieutx wants to merge 1 commit intoNousResearch:mainfrom
dieutx:fix/cli-subprocess-timeouts

Conversation

@dieutx
Copy link
Copy Markdown
Contributor

@dieutx dieutx commented Mar 29, 2026

Summary

4 subprocess.run() calls in CLI utilities have no timeout parameter and can hang indefinitely if the child process blocks. The rest of the CLI already uses timeouts (clipboard.py: 3-15s, banner.py: 5-10s, doctor.py npm audit: 30s) — these 4 are the only ones missing.

Same class of bug as #3469 (git/ripgrep subprocess calls hanging on large repos).

Root Cause

subprocess.run() defaults to timeout=None, meaning it waits forever. If the docker daemon is unresponsive, systemctl is waiting for D-Bus, or an SSH host is unreachable past its ConnectTimeout, hermes doctor and hermes status hang with no feedback.

Fix

  • hermes_cli/doctor.py:409docker info: add timeout=10
  • hermes_cli/doctor.py:429ssh ... echo ok: add timeout=15 (slightly above SSH's own ConnectTimeout=5)
  • hermes_cli/status.py:285systemctl --user is-active: add timeout=5
  • hermes_cli/status.py:296launchctl list: add timeout=5

Each call site catches subprocess.TimeoutExpired and treats it as failure, consistent with how non-zero return codes are already handled.

Tests

1 new AST-based regression test in tests/hermes_cli/test_subprocess_timeouts.py — parameterized across all 4 CLI modules that use subprocess.run(). Parses each file and asserts every call has a timeout keyword. Catches future regressions automatically.

11 pre-existing doctor/status tests + 4 new = 15 passed.

Add timeout parameters to 4 subprocess.run() calls that could hang
indefinitely if the child process blocks (e.g., unresponsive docker
daemon, systemctl waiting for D-Bus):

- doctor.py: docker info (timeout=10), ssh check (timeout=15)
- status.py: systemctl is-active (timeout=5), launchctl list (timeout=5)

Each call site now catches subprocess.TimeoutExpired and treats it as
a failure, consistent with how non-zero return codes are already handled.

Add AST-based regression test that verifies every subprocess.run() call
in CLI modules specifies a timeout keyword argument.
dlkakbs added a commit to dlkakbs/hermes-agent that referenced this pull request Mar 29, 2026
All subprocess.run() calls in hermes_cli/gateway.py lacked a timeout
parameter. If systemctl, launchctl, loginctl, wmic, or ps blocks
(e.g. D-Bus unavailable, WMI service stuck, launchd unresponsive),
hermes gateway start/stop/restart/status/install/uninstall hangs
indefinitely with no feedback to the user.

Timeout values applied:
- Lifecycle commands (start/stop/restart/enable/disable/daemon-reload,
  launchctl load/unload): timeout=30
- Status/query commands (is-active, loginctl show-user, launchctl list,
  systemctl status, journalctl, tail, ps aux, wmic): timeout=5-10
- loginctl enable-linger: timeout=10

For _is_service_running() and launchd_status(), TimeoutExpired is caught
explicitly and treated as not-running, matching how non-zero return codes
are already handled. All other call sites are either inside existing
try/except Exception blocks (find_gateway_pids, _enable_systemd_linger,
get_systemd_linger_status) or raise TimeoutExpired as a clear error
instead of hanging forever.

Same class of fix as NousResearch#3469 (context_references) and NousResearch#3693 (doctor/status).
@teknium1
Copy link
Copy Markdown
Contributor

Merged via #4009. Cherry-picked with authorship preserved, all 32 tests pass. The AST regression test is a nice touch — catches future missing timeouts automatically. Thanks @dieutx!

@teknium1 teknium1 closed this Mar 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants